On estimating the useful work distribution of parallel programs under P3T: a static performance estimator

Author

  • Thomas Fahringer
Abstract

To improve a parallel program's performance, it is critical to evaluate how evenly the work contained in the program is distributed over all processors dedicated to the computation. Traditional work distribution analysis is commonly performed at the machine level. The disadvantage of this method is that it cannot identify whether the processors are performing useful or redundant (replicated) work. This paper describes a novel method of statically estimating the useful work distribution of distributed memory parallel programs at the program level, which carefully distinguishes between useful and redundant work. The amount of work contained in a parallel program, which correlates with the number of loop iterations to be executed by each processor, is estimated by accurately modeling loop iteration spaces, array access patterns and data distributions. A cost function defines the useful work distribution of loops, procedures and the entire program. Lower and upper bounds of the described parameter are presented. The computational complexity of the cost function is independent of the program's problem size, statement execution and loop iteration counts. As a consequence, estimating the work distribution based on the described method is considerably faster than simulating or actually executing the program. Automatic estimation of the useful work distribution is fully implemented as part of P3T, a static parameter-based performance prediction tool under the Vienna Fortran Compilation System (VFCS). The Lawrence Livermore Loops are used as a test case to verify the approach.
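The idea of counting useful loop iterations per processor in closed form can be illustrated with a minimal sketch. It is not the paper's cost function. It assumes a one-dimensional block distribution, an owner-computes rule, and a single loop `for i in range(lo, hi): A[i] = ...`, and it uses a normalized standard deviation as a stand-in imbalance measure. The names `block_bounds`, `useful_iterations` and `imbalance` are hypothetical. Under these assumptions the measure ranges from 0 (perfectly even) to 1 (all useful work on one processor), and its cost depends on the number of processors, not on the problem size.

```python
import math


def block_bounds(n, p, proc):
    """Half-open index range [start, end) owned by `proc` when n array
    elements are block-distributed over p processors (assumed layout)."""
    size = -(-n // p)  # ceiling division without floating point
    start = min(proc * size, n)
    end = min(start + size, n)
    return start, end


def useful_iterations(n, p, lo, hi):
    """Useful iterations per processor for `for i in range(lo, hi): A[i] = ...`
    under an assumed owner-computes rule: a processor does useful work only
    for the elements it owns. Each count is an interval intersection, so no
    iteration is ever enumerated."""
    counts = []
    for proc in range(p):
        start, end = block_bounds(n, p, proc)
        counts.append(max(0, min(hi, end) - max(lo, start)))
    return counts


def imbalance(counts):
    """Hypothetical imbalance measure: population standard deviation of the
    per-processor counts, divided by mean * sqrt(p - 1) so the result lies
    in [0, 1]."""
    p = len(counts)
    mean = sum(counts) / p
    if p < 2 or mean == 0:
        return 0.0
    variance = sum((c - mean) ** 2 for c in counts) / p
    return math.sqrt(variance) / (mean * math.sqrt(p - 1))
```

For example, `useful_iterations(100, 4, 10, 60)` yields `[15, 25, 10, 0]`: the fourth processor owns none of the written elements, so it contributes no useful work to this loop.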

Similar resources

P3T: An Automatic Performance Estimator for Parallel Programs

The area of parallelizing compilers for distributed memory multicomputers has seen considerable research activity during the last few years. Most of the current compilers do not provide any support for estimating performance impacts of code changes that they apply. In this paper, we present P3T, which is a static and automatic performance estimator for data parallel programs. It computes at c...

Estimating a Bounded Normal Mean Under the LINEX Loss Function

Let X be a random variable from a normal distribution with unknown mean θ and known variance σ². In many practical situations, θ is known in advance to lie in an interval, say [−m, m], for some m > 0. As the usual estimator of θ, i.e., X, is inadmissible under the LINEX loss function, finding some competitors for X becomes worthwhile. The only study in the literature considered the problem of min...

P3T+: A performance estimator for distributed and parallel programs

Developing distributed and parallel programs on today's multiprocessor architectures is still a challenging task. Particularly distressing is the lack of effective performance tools that support the programmer in evaluating changes in code, problem and machine sizes, and target architectures. In this paper we introduce P3T+, which is a performance estimator for distributed and parallel programs. ...

Evaluation of P3T+: A Performance Estimator for Distributed and Parallel Applications

T. Fahringer†, A. Požgaj†, J. Luitz, H. Moritsch‡. †Institute for Software Technology and Parallel Systems, University of Vienna, Liechtensteinstrasse 22, A-1092 Vienna, Austria, [tf,alex]@par.univie.ac.at. ‡Department of Business, University of Vienna, Brünner Strasse 72, A-1210 Vienna, Austria, [email protected]. Institute of Physical and Theoretical Chemistry, Vienna Uni...

On Using Volume Computation to Estimate the Work Distribution for Parallel Programs

In this paper we describe a performance parameter which models the work contained in a parallel program and the corresponding work distribution. The work distribution is modeled at the program level which carefully distinguishes between useful and redundant work. We achieve high accuracy due to aggressive exploitation of compiler knowledge such as loop iteration spaces, array access patterns an...

Journal:
  • Concurrency - Practice and Experience

Volume 8 (issue and page numbers not listed)

Publication date: 1996